Data Engg. Vol 2 - Data Processing & Transformation (Batch & Stream Processing)

Rating: 5.0 out of 5 (1 rating)

Created by Soumyadeep Dey

  • English

About the course

This is Volume 2 of the Data Engineering course. In this course I will talk about the open-source data processing technologies Spark and Kafka, which are among the most widely used frameworks for batch and stream processing. You will learn Spark from Level 100 to Level 400 through real-life hands-on exercises and projects. You will also be introduced to a data lake on AWS (that is, S3) and a data lakehouse using Apache Iceberg.

AWS will be used as the hosting platform, and I will talk about the AWS services EMR, S3 and MSK. I will cover Databricks as a Spark hosting platform. I will also show you Spark integration with other services such as AWS RDS (MySQL or PostgreSQL) and Redshift.

You will get opportunities to work hands-on with large datasets (100 GB - 300 GB or more of data). The hands-on exercises match real-world scenarios such as Spark batch processing, stream processing, performance tuning, streaming ingestion, window functions and ACID transactions on Iceberg.

Some other highlights:

  • 10 projects with different datasets, with a total dataset size of 250 GB or more.
  • Other technologies covered - EC2, EBS, VPC and IAM.
  • AWS Lambda for data processing and Apache Airflow for Data Pipeline Orchestration.
  • Optional Python videos
  • Optional AWS and SQL Essentials videos

Please provide feedback and suggestions if you want me to add any other topics.

What do we offer

Live learning

Live sessions (6-8 hrs) are held every week. The link to the sessions will be shared once you enrol. Recordings of the live sessions will be made available to all learners.

Structured learning

Our curriculum is designed to take you from beginner level to expert level using production-size datasets and production-like scenarios in all courses.

Community & Networking

Interact and network with like-minded folks from various backgrounds in Live Sessions.

Learn with the best

Stuck on something? Discuss it with your peers and the instructors in the inbuilt chat groups.

Production-Like Projects

Each course contains a minimum of 8 to 10 projects with at least 150 - 200 GB of datasets. We highly recommend completing all the projects to gain an understanding of real-life scenarios.

Get certified

Flaunt your skills with course certificates. You can showcase the certificates on LinkedIn with a click.
